Missing data techniques for robust speech recognition
Abstract
In noisy listening conditions, the information available on which to base speech recognition decisions is necessarily incomplete: some spectro-temporal regions are dominated by other sources. We report on the application of a variety of techniques for missing data in speech recognition. These techniques may be based on marginal distributions or on reconstruction of missing parts of the spectrum. Application of these ideas in the Resource Management task shows performance which is robust to random removal of up to 80% of the frequency channels, but falls off rapidly with deletions which more realistically simulate masked speech. We report on a vowel classification experiment designed to isolate some of the RM problems for more detailed exploration. The results of this experiment confirm the general superiority of marginals-based schemes, demonstrate the viability of shared covariance statistics, and suggest several ways in which performance improvements on the larger task may be obtained.
The missing data problem arises naturally in many pattern recognition tasks [2,8] where elements of data vectors to be classified are unavailable during training and/or recognition. The causes of incomplete evidence include unreliable sensors, band-restricted data transmission (e.g. the spectral filtering action of a telephone channel), or partial occlusion of the desired pattern by an interfering signal. In the latter case, it is assumed that some preprocessor is able to determine which parts of the mixed observation correspond to the source to be classified.
Our motivation for studying the missing data problem derives from ongoing studies at Sheffield and elsewhere [1] on computational auditory scene analysis (CASA), in which evidence for different sound sources is separated using auditory grouping principles. CASA is an attractive paradigm for robust ASR. It makes no assumptions about the type and number of acoustic sources which make up the mixture, and does not require prior exposure to these sources. However, separation will never be able to recover all the evidence: there will be some regions where other sound sources dominate. CASA-based robust ASR requires that the resulting missing data problem be confronted.
In previous work [4,9] we demonstrated that it is possible to remove high proportions (up to 90%) of the input spectrum without significant deterioration in recognition rates. In ICASSP-95, we reported (using NOISEX) noise tolerance comparable to that of human listeners when only those spectro-temporal regions with a favourable local SNR were retained. Subsequently, we have applied missing data techniques to the Resource Management (RM) …
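The marginals-based approach mentioned in the abstract can be illustrated with a minimal sketch. It assumes diagonal-covariance Gaussian class models, so that integrating out the missing channels amounts to dropping their terms from the log-likelihood. The source does not specify the model form, and the function names, the toy models, and the mask below are all hypothetical.

```python
import math


def marginal_log_likelihood(x, mask, mean, var):
    """Log-likelihood of the reliable components of x under a diagonal Gaussian.

    Assumption: diagonal covariance, so marginalising over the missing
    components simply omits their terms from the sum.
    """
    total = 0.0
    for xi, reliable, mu, v in zip(x, mask, mean, var):
        if reliable:
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return total


def classify(x, mask, models):
    """Return the label whose model scores the reliable components highest."""
    return max(
        models,
        key=lambda label: marginal_log_likelihood(x, mask, *models[label]),
    )


# Hypothetical two-class models: (mean vector, variance vector) per class.
models = {
    "a": ([1.0, 5.0, 2.0], [1.0, 1.0, 1.0]),
    "b": ([4.0, 1.0, 2.5], [1.0, 1.0, 1.0]),
}

x = [1.2, 0.0, 2.1]          # the middle channel is dominated by another source
mask = [True, False, True]   # True = reliable, False = missing

label = classify(x, mask, models)                     # uses reliable channels only
full_label = classify(x, [True, True, True], models)  # trusts the masked channel
```

In this toy case, treating the masked channel as reliable pulls the decision towards class "b", whereas scoring only the reliable channels yields "a".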
Related resources
A new method for robust speech recognition based on missing data using a bidirectional neural network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing feature method. In this approach, the components of the time-frequency representation of the signal (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced using the remaining components and statistical ...
Improving the performance of MFCC for Persian robust speech recognition
The Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...
Missing data theory, spectral subtraction and signal-to-noise estimation for robust ASR: an integrated study
In the missing data approach to robust Automatic Speech Recognition (ASR), time-frequency regions which carry reliable speech information are identified. Recognition is then based on these regions alone. In this paper, we address the problem of identifying reliable regions and propose two criteria to solve this based on negative energy (ŝ < 0) and SNR ( $ s s n 2 2 2 < + ). These criteria a...
Robust ASR based on clean speech models: an evaluation of missing data techniques for connected digit recognition in noise
In this study, techniques for classification with missing or unreliable data are applied to the problem of noise-robustness in Automatic Speech Recognition (ASR). The techniques described make minimal assumptions about any noise background and rely instead on what is known about clean speech. A system is evaluated using the Aurora 2 connected digit recognition task. Using models trained on clea...
From Missing Data to Maybe Useful Data: Soft Data Modelling for Noise Robust ASR
Much research has been focused on the problem of achieving automatic speech recognition (ASR) which approaches human recognition performance in its level of robustness to noise and channel distortion. We present here a new approach to data modelling which has the potential to combine complementary existing state-of-the-art techniques for speech enhancement and noise adaptation into a single proc...
Publication date: 1997